Creating the DISEQuA Corpus: a Test Set for Multilingual Question Answering

Authors

  • Bernardo Magnini
  • Simone Romagnoli
  • Alessandro Vallin
  • Jesús Herrera
  • Anselmo Peñas
  • Víctor Peinado
  • M. Felisa Verdejo
  • Maarten de Rijke
Abstract

This paper describes the procedure adopted by the three coordinators of the CLEF 2003 question answering track (ITC-irst, UNED and ILLC) to create the question set for the monolingual tasks. Despite the limited resources available, the three groups collaborated to formulate and verify a large pool of original questions posed in three languages: Dutch, Italian and Spanish. A subset of these questions was translated into English and shared among the three coordinating groups. A second cross-verification was then conducted to extract the questions that had an answer in all three monolingual document collections. The result of this joint effort was the DISEQuA (Dutch Italian Spanish English Questions and Answers) corpus, a reusable resource that is freely available to the research community. The article reports on the stages of the corpus creation, from the monolingual kernels to the multilingual extension.

Similar resources

Presenting a Religious Question Answering Corpus in Persian

Question answering is a field of natural language processing and information retrieval that has attracted researchers' attention in recent decades. Growing interest in this field has created a need for appropriate data sources. Most work on developing question answering corpora has so far been done for English, but in other languages such as Persian, the lack of these co...

The First Cross-Script Code-Mixed Question Answering Corpus

In this paper, we formally introduce the problem of cross-script code-mixed question answering (QA) and we elaborate the corpus acquisition process and an evaluation strategy related to the said problem. Today social media platforms are flooded by millions of posts every day on various topics. This paper emphasizes the use of such ever-growing user-generated content to serve as information collec...

Multilingual Pattern Libraries for Question Answering: a Case Study for Definition Questions

In this paper we investigate the effectiveness of a novel resource for Multilingual Question Answering (QA). Such a resource consists of a set of multilingual pattern libraries for answer extraction and validation. In the spirit of the ongoing attempts to develop freely available resources for QA, we argue that the distribution and use of pattern libraries will contribute to make Multilingual Q...

Reflections on TREC QA

The TREC (later TAC) Question Answering track reinvigorated the question answering research community, fostering extensive research on different question types and finding answers in different kinds of corpora. Parallel evaluations extended the research further to include a variety of languages and media types. In recent years, the TAC QA track evolved into the Knowledge Base Population (KBP) t...

The Multiple Language Question Answering Track at CLEF 2003

This paper reports on the pilot question answering track that was carried out within the CLEF initiative this year. The track was divided into monolingual and bilingual tasks: monolingual systems were evaluated within the frame of three non-English European languages, Dutch, Italian and Spanish, while in the cross-language tasks an English document collection constituted the target corpus for It...


Publication date: 2003